arena: rename inner-SIMD-align knob and drop default 64 -> 32 #552
Merged
…4 -> 32

Rename the in-arena element-storage alignment knob to TA_ARENATENSOR_SIMD_ALIGN (matching the TA_ prefix used by every other TA CMake option/var), and wire it through the same CMake -> config.h.in -> header pipeline as TA_MAX_SOO_RANK_METADATA. The previous TILEDARRAY_INNER_SIMD_ALIGN was a header-only #ifndef/#define knob with no CMake surface; the new form is a proper cache variable, documented in INSTALL.md.

Drop the default from 64 B to 32 B: 32 B covers AVX2 YMM loads/stores (the most common x86_64 SIMD target today) and shaves 32 B/cell off the in-arena padding. AVX-512 builds that want a wider floor are one `-DTA_ARENATENSOR_SIMD_ALIGN=64` away. The doc comment and INSTALL.md entry also call out the NEON / Apple-Silicon options (16 / 128).

No backward-compatible alias for the old macro/constant names, since there are no external users yet.
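As a rough illustration of the header end of that pipeline, the sketch below shows how a build-provided value could surface as a compile-time constant. It is not the actual TiledArray header: the `sketch` namespace, the `is_power_of_two` helper, the `#ifndef` fallback, and the `@TA_ARENATENSOR_SIMD_ALIGN@` substitution line mentioned in the comment are assumptions made only to keep the example self-contained. Only the macro name, the constant name `kArenaTensorSimdAlign`, and the 32 B default come from this PR.

```cpp
#include <cstddef>

// Assumption: a generated config.h would normally define this macro, e.g. from
// a config.h.in line like
//   #define TA_ARENATENSOR_SIMD_ALIGN @TA_ARENATENSOR_SIMD_ALIGN@
// The fallback below exists only so this sketch compiles on its own.
#ifndef TA_ARENATENSOR_SIMD_ALIGN
#define TA_ARENATENSOR_SIMD_ALIGN 32  // new default described in this PR
#endif

namespace sketch {

// Hypothetical helper: true if `n` is a nonzero power of two.
constexpr bool is_power_of_two(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Compile-time constant mirroring the knob's value.
constexpr std::size_t kArenaTensorSimdAlign = TA_ARENATENSOR_SIMD_ALIGN;

// An alignment must be a power of two; reject e.g. -D...=48 at compile time.
static_assert(is_power_of_two(kArenaTensorSimdAlign),
              "TA_ARENATENSOR_SIMD_ALIGN must be a power of two");

}  // namespace sketch
```

Under these assumptions, configuring with `-DTA_ARENATENSOR_SIMD_ALIGN=64` would change the constant to 64 without touching the header, while a non-power-of-two value would fail the build early.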
Force-pushed from cee791e to 3da31da.
Summary
Two related changes to the ArenaTensor in-cell alignment knob in `src/TiledArray/tensor/arena_tensor.h`:

- Rename `TILEDARRAY_INNER_SIMD_ALIGN` → `TILEDARRAY_ARENATENSOR_SIMD_ALIGN` (and `kInnerSimdAlign` → `kArenaTensorSimdAlign`), so the knob's name reflects the type whose layout it parametrizes. Hard cut: no compat alias, since there are no external users yet.
- Drop the default from 64 B to 32 B, which saves 32 B of `data_offset` per inner cell. AVX-512 builds that want the wider floor stay one `-DTILEDARRAY_ARENATENSOR_SIMD_ALIGN=64` away.

Why this matters: each `ArenaTensor` cell pads from `sizeof(Cell)` (~14 B for `btas::zb::RangeNd<>`) up to this alignment before its element storage, so per-inner-cell bookkeeping is `data_offset` + 8 B view ptr. On a ToT tile with millions of inner cells (e.g. PNO-CCSD), the difference between 32 B and 64 B padding is on the order of hundreds of MB of memory.

The doc comment now spells out the reasonable overrides, such as 64 for AVX-512 and the NEON / Apple-Silicon options (16 / 128).
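To make the padding arithmetic concrete, here is a minimal sketch that rounds a cell header up to the alignment and totals the savings. The `sketch` namespace and the helpers `round_up`, `bookkeeping_bytes`, and `saved_bytes` are hypothetical names, not TiledArray API. The 14 B header size is the approximate figure quoted in this description, and the 5,000,000-cell count is an arbitrary illustrative value.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Round `n` up to the next multiple of `align` (assumed to be a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Per-inner-cell bookkeeping as described above: the padded header
// (data_offset) plus an 8 B view pointer.
constexpr std::size_t bookkeeping_bytes(std::size_t sizeof_cell,
                                        std::size_t align) {
  return round_up(sizeof_cell, align) + 8;
}

// Approximate header size quoted in this PR ("~14 B"); an assumption here.
constexpr std::size_t kSizeofCell = 14;

// Bytes saved over `n_cells` inner cells when moving from 64 B to 32 B.
constexpr std::uint64_t saved_bytes(std::uint64_t n_cells) {
  return n_cells * (bookkeeping_bytes(kSizeofCell, 64) -
                    bookkeeping_bytes(kSizeofCell, 32));
}

static_assert(round_up(kSizeofCell, 32) == 32, "14 B header pads to 32 B");
static_assert(round_up(kSizeofCell, 64) == 64, "14 B header pads to 64 B");
static_assert(saved_bytes(5000000) == 160000000,
              "5 million cells save 160 MB at 32 B per cell");

}  // namespace sketch
```

With these numbers each cell drops from 72 B to 40 B of bookkeeping, so the saving scales linearly at 32 B per inner cell.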
Test plan
`arena_suite`, `arena_kernels_suite`, `arena_einsum_unit_suite`, `arena_tot_trivial_suite`, `arena_sizeof_invariant_suite`, `arena_tensor_suite`, and `arena_tensor_kernels_suite` all pass against the new default (np=1, debug build, `TA_ASSERT_POLICY=TA_ASSERT_THROW`).